It Works on My Machine - Reproducibility in R for Small Teams

R Gov 2023
October 19, 2023

Melissa Albino Hegeman

[Image: "Works on my machine" Microsoft Teams sticker]

About Me

  • Marine biologist

  • Get really seasick
  • Work with fisheries data

Disclaimer

  • I work for NYSDEC, but the opinions I’m presenting are my own and don’t reflect agency policy.

  • I generated the images with Adobe Firefly.

How I Am Using R

  • Automate routine tasks
  • Generate individualized reports

What is a Team?

  • 10 people or fewer
  • Limited to no experience with R
  • No enterprise tools

Sharing the Load

  • What happens after you implement a big change?
  • Who is responsible for maintenance?
  • Who is responsible for new features?

Setting Up for Success

  • R projects
  • GitHub
  • Custom R package
  • renv

It Works on My Machine

  • Error in library(tidyverse) : there is no package called ‘tidyverse’

  • Error in plot(data) : object 'data' not found

  • Error in file(file_path, "r") : cannot open the connection

  • cannot create dir 'output', permission denied
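Errors like these can often be caught before the real work starts. The following is a minimal sketch in base R, not the checks my team actually runs: `missing_packages()` and `ensure_dir()` are hypothetical helper names, and the package list and output folder are placeholders.

```r
# Hypothetical helper: report which required packages are not installed,
# instead of failing at the first library() call.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: create an output folder if it does not exist yet,
# then report whether it is there.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  dir.exists(path)
}

# Placeholder list; a real project would list its own dependencies.
needed <- c("stats", "utils")
not_installed <- missing_packages(needed)
if (length(not_installed) > 0) {
  message("Install these packages first: ",
          paste(not_installed, collapse = ", "))
}

# tempdir() is used only so the sketch runs anywhere (assumption);
# in a project this would be a folder inside the project directory.
output_dir <- file.path(tempdir(), "output")
ensure_dir(output_dir)
```

A check like this does not fix a permissions problem, but it turns a cryptic error halfway through a script into a clear message at the start.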

How Do I Fix It?

  • R projects
  • GitHub
  • Custom R package
  • renv

R Projects

Issues

  • Not being used as intended

Successes

  • Relative file paths
  • First step in reproducibility
  • Adds portability
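To illustrate the relative file path point, here is a minimal sketch. It assumes the project was opened through its `.Rproj` file in RStudio, so the working directory is the project folder. The folder and file names are hypothetical.

```r
# An absolute path like this only exists on one person's computer:
# landings <- read.csv("C:/Users/me/Documents/fisheries/data/landings.csv")

# A path relative to the project folder works for anyone who opens the project.
# "data" and "landings.csv" are hypothetical names.
data_file <- file.path("data", "landings.csv")

if (file.exists(data_file)) {
  landings <- read.csv(data_file)
} else {
  message("Expected to find ", data_file, " inside the project folder")
}
```

`file.path()` builds the path with forward slashes, which R accepts on Windows, macOS, and Linux alike.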

GitHub Repositories

Issues

  • Steep learning curve

Successes

  • It’s the most efficient way to get the code on everyone’s machine

  • Gives team members the freedom to experiment

Custom R Package

Issues

  • Keeping everything in sync

  • Updates and maintenance

Successes

  • Everyone is applying the same treatment to the data no matter where they are working

  • Depends on staff regularly updating their installed version
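One way to make an out-of-date package visible is to check the installed version at the top of a script. This is a sketch only: `is_current()` is a hypothetical helper, and `stats` stands in for the team's custom package, whose name and version numbers are not shown here.

```r
# Hypothetical helper: TRUE if `pkg` is installed at `min_version` or newer.
is_current <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= package_version(min_version)
}

# "stats" is a stand-in for the custom package (assumption).
if (!is_current("stats", "1.0.0")) {
  stop("Please update the package before running this script.")
}
```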

renv

Issues

  • Slow to boot up a project for the first time
  • Staff were updating the lockfile rather than adjusting their installed packages when their project was out of sync

Successes

  • This is still a work in progress

  • Wait until you are done developing before you initialize renv

  • Forced me to minimize the number of dependencies I rely on

Solutions

  • Consistent and continued training for new staff
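The lockfile issue comes down to which direction the sync runs. `renv::snapshot()` writes the currently installed packages into the lockfile, while `renv::restore()` installs the versions the lockfile already records. A minimal sketch of the step staff should take when a project is out of sync, assuming renv is installed (`sync_to_lockfile()` is a hypothetical wrapper name):

```r
# Hypothetical wrapper: bring the installed packages in line with the
# lockfile, rather than changing the lockfile to match the machine.
sync_to_lockfile <- function(project = ".") {
  if (!requireNamespace("renv", quietly = TRUE)) {
    stop("Install renv first: install.packages('renv')")
  }
  # Report how the project library and the lockfile differ.
  renv::status(project = project)
  # Install the package versions recorded in the lockfile.
  renv::restore(project = project, prompt = FALSE)
}

# Run from the project folder when renv reports the project is out of sync:
# sync_to_lockfile()
```

In this sketch, `renv::snapshot()` is left to whoever maintains the project, so the lockfile only changes on purpose.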

Next Steps

Containers

::: notes

Everything I’ve talked about so far is from the perspective of a small group without significant IT resources to throw around. However, there is one strategy I’ve shied away from in the past because I’ve associated it with larger IT operations: containers. I’ve been experimenting with containers to have even more control over how team members are coding in R. The examples I’ve talked about to this point involved people running code already written. I want to see other team members start adding features to these projects, and having a consistent environment from the start should help avoid some of the problems I’ve discussed today.

Wrap Up

Thank You

Melissa Albino Hegeman

  • melissa.hegeman@gmail.com
  • https://www.linkedin.com/in/melissaalbinohegeman/
  • https://github.com/mhegeman/2023_rgov